AAAI.2021 - AI for Conference Organization and Delivery

Total: 5

#1 Argument Mining Driven Analysis of Peer-Reviews

Authors: Michael Fromm ; Evgeniy Faerman ; Max Berrendorf ; Siddharth Bhargava ; Ruoxia Qi ; Yao Zhang ; Lukas Dennert ; Sophia Selle ; Yang Mao ; Thomas Seidl

Peer reviewing is a central process in modern research and essential for ensuring the high quality and reliability of published work. At the same time, it is time-consuming, and increasing interest in emerging fields often results in a high review workload, especially for senior researchers in those areas. How to cope with this problem is an open question that is actively discussed across all major conferences. In this work, we propose an Argument Mining-based approach to assist editors, meta-reviewers, and reviewers. We demonstrate that the decision process in scientific publishing is driven by arguments and that automatic argument identification is helpful in various use cases. One of our findings is that arguments used in the peer-review process differ from arguments in other domains, making the transfer of pre-trained models difficult. Therefore, we provide the community with a new dataset of peer reviews from different computer science conferences with annotated arguments. In our extensive empirical evaluation, we show that Argument Mining can be used to efficiently extract the parts of a review that matter most for the publication decision. Also, the process remains interpretable, since the extracted arguments can be highlighted in a review without detaching them from their context.
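
The extraction step described above can be pictured as sentence-level argument classification followed by in-context highlighting. The snippet below is a minimal sketch, not the authors' model or dataset: it assumes a small set of hand-labeled review sentences and uses a TF-IDF bag-of-words classifier; the example sentences, labels, and the `highlight_arguments` helper are all hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: review sentences labeled 1 (argument) or 0 (non-argument).
train_sentences = [
    "The proposed method clearly outperforms the baselines on all benchmarks.",
    "The paper is organized into five sections.",
    "The experimental evaluation is too limited to support the main claim.",
    "Table 2 lists the hyperparameters.",
]
train_labels = [1, 0, 1, 0]

# TF-IDF features with a linear classifier as a simple stand-in model.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_sentences, train_labels)

def highlight_arguments(review_sentences):
    """Return the review with a '>>>' marker on sentences flagged as arguments,
    keeping them in their original context."""
    preds = clf.predict(review_sentences)
    return [f">>> {s}" if p == 1 else s for s, p in zip(review_sentences, preds)]

review = [
    "The writing is generally clear.",
    "The novelty over prior work is marginal, which weakens the contribution.",
]
for line in highlight_arguments(review):
    print(line)
```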

#2 Uncovering Latent Biases in Text: Method and Application to Peer Review

Authors: Emaad Manzoor ; Nihar B. Shah

Quantifying systematic disparities in numerical quantities such as employment rates and wages between population subgroups provides compelling evidence for the existence of societal biases. However, biases in the text written for members of different subgroups (such as in recommendation letters for male and non-male candidates), though widely reported anecdotally, remain challenging to quantify. In this work, we introduce a novel framework to quantify bias in text caused by the visibility of subgroup membership indicators. We develop a nonparametric estimation and inference procedure to estimate this bias. We then formalize an identification strategy to causally link the estimated bias to the visibility of subgroup membership indicators, provided observations from time periods both before and after an identity-hiding policy change. We identify an application wherein “ground truth” bias can be inferred to evaluate our framework, instead of relying on synthetic or secondary data. Specifically, we apply our framework to quantify biases in the text of peer reviews from a reputed machine-learning conference before and after the conference adopted a double-blind reviewing policy. We show evidence of biases in the review ratings that serves as “ground truth”, and show that our proposed framework accurately detects the presence (and absence) of these biases from the review text without having access to the review ratings.
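
As a rough illustration of the before/after comparison sketched in the abstract (not the authors' estimation or identification procedure), the snippet below computes a difference-in-differences in a text-derived score between two subgroups across the single-blind and double-blind periods, with a permutation p-value. The `text_score` function, the data, and the null model are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def text_score(review: str) -> float:
    """Placeholder text summary (word count); a real analysis would use a richer representation."""
    return float(len(review.split()))

def gap(scores, groups):
    """Mean score gap between subgroup 1 and subgroup 0."""
    scores, groups = np.asarray(scores, dtype=float), np.asarray(groups)
    return scores[groups == 1].mean() - scores[groups == 0].mean()

def did_permutation_test(pre_scores, pre_groups, post_scores, post_groups, n_perm=5000):
    """Difference-in-differences of the subgroup gap across the policy change,
    with a permutation p-value obtained by shuffling subgroup labels."""
    observed = gap(pre_scores, pre_groups) - gap(post_scores, post_groups)
    hits = 0
    for _ in range(n_perm):
        stat = gap(pre_scores, rng.permutation(pre_groups)) - gap(
            post_scores, rng.permutation(post_groups))
        if abs(stat) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical reviews and subgroup labels (1 / 0) for the two periods.
pre_reviews = [
    "short dismissive review",
    "a much longer and more detailed review of the work",
    "ok",
    "thorough comments on every section of the paper",
]
post_reviews = ["detailed review", "another detailed review", "brief", "careful comments"]
print(did_permutation_test([text_score(r) for r in pre_reviews], [1, 0, 1, 0],
                           [text_score(r) for r in post_reviews], [1, 0, 1, 0]))
```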

#3 A Market-Inspired Bidding Scheme for Peer Review Paper Assignment

Authors: Reshef Meir ; Jérôme Lang ; Julien Lesca ; Nicholas Mattei ; Natan Kaminsky

We propose a market-inspired bidding scheme for the assignment of paper reviews in large academic conferences. We provide an analysis of the incentives of reviewers during the bidding phase, when reviewers have both private costs and some information about the demand for each paper, and their goal is to obtain the best possible k papers for a predetermined k. We show that by assigning 'budgets' to reviewers and a 'price' for every paper that is (roughly) proportional to its demand, the best response of a reviewer is to bid sincerely, i.e., on her favorite papers, and to match the budget even when it is not enforced. This game-theoretic analysis is based on a simple, prototypical assignment algorithm. We show, via extensive simulations on bidding data from real conferences, that our bidding scheme would substantially improve both the bid distribution and the resulting assignment.
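
A toy rendering of the pricing idea, under simplifying assumptions not taken from the paper: every paper's price is set roughly proportional to its current demand, all reviewers share one budget, and a sincere reviewer simply bids down her preference list until the budget is matched. The paper names, the demand-to-price rule, and `target_reviews_per_paper` are illustrative.

```python
from typing import Dict, List

def paper_prices(demand: Dict[str, int], target_reviews_per_paper: int = 3) -> Dict[str, float]:
    """Price each paper roughly proportionally to its demand, with a floor of 1."""
    return {paper: max(d, 1) / target_reviews_per_paper for paper, d in demand.items()}

def sincere_bid(preferences: List[str], prices: Dict[str, float], budget: float) -> List[str]:
    """Bid on the most-preferred papers, in order, until the budget is matched."""
    spent, bids = 0.0, []
    for paper in preferences:  # preferences sorted from most to least favorite
        if spent >= budget:
            break
        bids.append(paper)
        spent += prices[paper]
    return bids

# Illustrative demand figures (number of bids each paper has attracted so far).
demand = {"paper_A": 12, "paper_B": 3, "paper_C": 1}
prices = paper_prices(demand)
print(prices)  # under-demanded papers come out cheap, popular ones expensive
print(sincere_bid(["paper_B", "paper_A", "paper_C"], prices, budget=2.0))
```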

#4 A Novice-Reviewer Experiment to Address Scarcity of Qualified Reviewers in Large Conferences

Authors: Ivan Stelmakh ; Nihar B. Shah ; Aarti Singh ; Hal Daumé III

Conference peer review constitutes a human-computation process whose importance cannot be overstated: not only does it identify the best submissions for acceptance, but, ultimately, it impacts the future of the whole research area by promoting some ideas and restraining others. A surge in the number of submissions received by leading AI conferences has challenged the sustainability of the review process by increasing the burden on the pool of qualified reviewers, which is growing at a much slower rate. In this work, we consider the problem of reviewer recruiting with a focus on the scarcity of qualified reviewers in large conferences. Specifically, we design a procedure for (i) recruiting reviewers from the population not typically covered by major conferences and (ii) guiding them through the reviewing pipeline. In conjunction with ICML 2020, a large, top-tier machine learning conference, we recruit a small set of reviewers through our procedure and compare their performance with the general population of ICML reviewers. Our experiment reveals that a combination of the recruiting and guiding mechanisms allows for a principled enhancement of the reviewer pool and results in reviews of superior quality compared to the conventional pool of reviews, as evaluated by senior members of the program committee (meta-reviewers).

#5 Catch Me if I Can: Detecting Strategic Behaviour in Peer Assessment

Authors: Ivan Stelmakh ; Nihar B. Shah ; Aarti Singh

We consider the issue of strategic behaviour in various peer-assessment tasks, including peer grading of exams or homeworks and peer review in hiring or promotions. When a peer-assessment task is competitive (e.g., when students are graded on a curve), agents may be incentivized to misreport evaluations in order to improve their own final standing. Our focus is on designing methods for detection of such manipulations. Specifically, we consider a setting in which agents evaluate a subset of their peers and output rankings that are later aggregated to form a final ordering. In this paper, we investigate a statistical framework for this problem and design a principled test for detecting strategic behaviour. We prove that our test has strong false alarm guarantees and evaluate its detection ability in practical settings. For this, we design and conduct an experiment that elicits strategic behaviour from subjects and release a dataset of patterns of strategic behaviour that may be of independent interest. We use this data to run a series of real and semi-synthetic evaluations that reveal a strong detection power of our test.
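
The paper designs a principled statistical test with false-alarm guarantees; its exact test statistic and null model are not reproduced here. As a hedged illustration of the general shape of such a test, the skeleton below runs a permutation test over ranking profiles with a placeholder disagreement statistic and a uniform-shuffle null, both of which are assumptions for exposition only.

```python
import itertools
import random
from typing import Callable, Dict

Ranking = Dict[str, int]  # peer -> rank, 1 = best

def placeholder_statistic(rankings: Dict[str, Ranking]) -> float:
    """Placeholder statistic: count pairwise order disagreements between
    evaluators who ranked the same pair of peers (higher = more disagreement)."""
    total = 0
    for (_, ra), (_, rb) in itertools.combinations(rankings.items(), 2):
        common = sorted(set(ra) & set(rb))
        for x, y in itertools.combinations(common, 2):
            if (ra[x] - ra[y]) * (rb[x] - rb[y]) < 0:
                total += 1
    return float(total)

def permutation_p_value(rankings: Dict[str, Ranking],
                        statistic: Callable[[Dict[str, Ranking]], float],
                        n_perm: int = 1000, seed: int = 0) -> float:
    """P-value under a purely illustrative null model that shuffles each
    evaluator's submitted ranks uniformly at random."""
    rng = random.Random(seed)
    observed = statistic(rankings)
    hits = 0
    for _ in range(n_perm):
        shuffled = {}
        for agent, ranking in rankings.items():
            peers, ranks = list(ranking.keys()), list(ranking.values())
            rng.shuffle(ranks)
            shuffled[agent] = dict(zip(peers, ranks))
        if statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical profile: each agent ranks a subset of peers.
rankings = {
    "alice": {"bob": 1, "carol": 2, "dave": 3},
    "bob":   {"alice": 3, "carol": 1, "dave": 2},
    "carol": {"alice": 1, "bob": 2, "dave": 3},
}
print(permutation_p_value(rankings, placeholder_statistic, n_perm=500))
```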